Robust Text Analysis via Underspecification

Author

  • Frank Schilder
Abstract

This paper is concerned with the robust analysis of the discourse structure of a text via underspecification. Most current discourse theories (e.g. Rhetorical Structure Theory (RST) by Mann and Thompson (1988), Abduction by Hobbs et al. (1993), or Segmented Discourse Representation Theory (SDRT) by Asher (1993)) require detailed world and context knowledge to derive the discourse structure. They must produce a discourse structure for a given text in every case, so an ambiguous discourse may yield a large number of structures. The present approach instead derives an underspecified discourse structure for a text from a limited set of discourse cues. The structure is specified further only when there is evidence for a discourse relation or a set of discourse relations, for example via a discourse marker. After providing background information on underspecification and SDRT, the paper outlines a general framework for an underspecified discourse grammar. This framework captures scope ambiguities of discourse relations, introduces into the SDRT representation the underspecification of the discourse relation that links two segments, and further specifies the content of an abstract topic node that dominates a segment.


Similar resources

Shallow Parsing and Text Chunking: a View on Underspecification in Syntax

This paper illustrates a technique of shallow parsing named “text chunking” whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units which can be identified with certainty on the basis of available knowledge. The chunking process stops at that level of granularity beyond which the analysis gets undecidable. We argue that a chunked sy...


A Semantic Explication of Information Status and the Underspecification of the Recipients’ Knowledge

This article presents a survey of and an investigation into the notion of information status. Based on insights from DRT and presupposition theory, a new variant of IS taxonomies is developed, considering issues such as accommodation and underspecification of text with regard to hearer knowledge.


Towards a Robust Deep Language Understanding System

We propose a system that bridges the gap between the two major approaches toward natural language processing: robust shallow text processing and domain-specific (often linguistically-based) deep understanding. We propose to use an existing linguistically motivated deep understanding system as the core and to leverage statistical techniques and external resources such as world knowledge to broad...


A Cascaded Finite-State Parser for German

The paper presents two approaches to partial parsing of German: a tagger trained on dependency tuples, and a cascaded finite-state parser (Abney, 1997). For the tagging approach, the effects of choosing different representations of dependency tuples are investigated. Performance of the finite-state parser is boosted by delaying syntactically unsolvable disambiguation problems via underspecifica...


A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...




Publication date: 2000